Summary


Thirteen  subjects  participated  in  an  auditory  simulation  of  a  passive  sonar  target  detection 
environment.  Targets  were  300  ms  noise  bursts  presented  at  near  threshold  levels  in  a  noise 
background  at  a  mean  rate  of  10  per  minute.  Task-irrelevant  probe  tones  were  also  presented 
at inter-stimulus intervals of 2-4 seconds. Each subject participated in two 28 minute test
sessions, pressing a button whenever they detected a noise target. Prominent minute-scale
fluctuations in performance (computed as changes in local error rate using a 32-s moving
window) occurred in many of the sessions. Evoked responses to the irrelevant probe tones in
thirteen runs with the highest number of performance lapses were sorted by current local error
rate and smoothed using a moving average. The amplitude of the grand mean N2 response to the
irrelevant probe tones increased monotonically with error rate. Averaged evoked responses to
relatively frequent, task-irrelevant probe tones appear to allow an accurate estimate of level
of alertness if adequate numbers of trials are available.

Introduction


Since the experiments of Mackworth (1948), decrements in performance on continuous
supra-threshold detection tasks have been referred to as "vigilance decrements." Mackworth’s
results supported his recommendation that operators performing tasks which require sustained
vigilance should be relieved after 30 minutes on task. However, in such experiments mean
trends in performance across sessions may bear little resemblance to fluctuations in
responsivity within sessions. After monotonous test sessions, subjects may recall having become drowsy
but  still  report  having  experienced  only  brief  periods  of  daydreaming  or  absentmindedness  and 
deny  having  dozed  at  all,  even  after  having  ceased  responding  altogether  for  several  minutes. 
Knowledge  of  mean  trends  in  performance,  however,  does  not  allow  accurate  prediction  of 
when  an  individual  operator  or  pilot  is  unable  to  detect  and  respond  to  inconspicuous  but 
important  signals  or  events.  Nonetheless,  relatively  little  research  has  been  concerned  with 
quantifying  and  studying  fluctuations  in  alertness  on  a  minute-to-minute  basis.  This  is  in  part 
because,  out  of  desire  to  simulate  real-life  work  environments,  most  vigilance  research  has 
used  target  presentation  rates  too  low  to  define  such  fluctuations  behaviorally. 

Since  Loomis’  original  observations  of  electroencephalographic  (EEG)  changes  in  sleep 
(Loomis,  Harvey,  and  Hobart,  1937),  researchers  have  imagined  the  possibility  of  developing 
an  electrophysiological  monitor  of  alertness.  But  while  relatively  successful  methods  have 
been developed for automatically categorizing sleep stages using combined EEG and
electrooculogram (EOG) measures (see Goeller and Sinton, 1989), the problem of tracking the
electrophysiological signs of loss of alertness due to transition to sleep has proven difficult.

In  such  research,  most  attention  has  been  given  to  changes  in  the  appearance  (Santamaria 
and Chiappa, 1987), topography (Ulrich and Frick, 1986), or spectrum (Townsend and Johnson,
1979) of the spontaneous EEG during transition to sleep as defined by EEG criteria.
However,  several  reports  have  studied  electrophysiological  correlates  of  decline  in  alertness 
using  continuous  performance  measures.  Beatty,  Greenberg,  Deibler,  and  O’Hanlon  (1974) 
reported that periods of lowered performance on a visual task were predicted by lowered
electrocortical activation as indexed by a simple measure of occipital theta band EEG amplitude.
Matousek and Petersen (1983), using a linear combination of over thirty EEG band and
band-ratio amplitude measures, were able to reproduce the classification of EEG epochs into
awake and Stage I sleep states made by visual inspection. Belyavin and Wright (1987) reported that
changes  in  amplitudes  in  theta  (4-7  Hz)  and  beta  (14-21  Hz)  EEG  bands  could  be  used  to 
predict  changes  in  performance  on  a  simple  vigilance  task,  while  delta  (1-3  Hz)  and  beta  band 
levels  were  best  predictors  of  fluctuations  in  performance  on  a  more  difficult  discrimination 
task.  Torsvall  and  Akerstedt  (1988)  observed  that  slow  eye  movements  and  a  large  increase 
in  alpha  EEG  precede  dozing  off  during  performance  of  a  simple  behavioral  task.  Ogilvie, 
Simons,  Kuderian,  MacDonald,  and  Rustenburg  (1991),  using  an  auditory  response  task  at  the 
beginning  of  all-night  sleep  sessions,  have  recently  claimed  that  power  in  all  EEG  bands 
increases  when  sleep-related  lapses  first  occur. 

Most  other  reports  have  studied  the  transition  from  waking  EEG  to  sleeping  EEG  using 
EEG rather than behavioral criteria (see, for example, Fruhstorfer and Bergstrom, 1969;
Penzel and Petzold, 1989). However, all authors agree that across subjects, recognized EEG
correlates of drowsiness -- including alpha amplitude increase and frontal spread, minute-scale
spectral variability, slow horizontal eye movements, appearance of slow wave activity and
sleep spindles -- may not be strongly correlated with changes in performance. Further, across
subjects the appearance of these signs varies substantially (Santamaria and Chiappa, 1987),
limiting the potential success of subject-independent linear prediction algorithms.

Another approach to alertness monitoring involves measures of event-related potentials
(ERPs). Several endogenous ERP features linked to cognitive processes, foremost among them
the P300, are well known to index the allocation of attention to the evoking stimulus, and
therefore fade or disappear at or near sleep onset (reviewed in Kramer, 1991). However,
cognitive evoked response methods in general, and P300 recordings in particular, require
that the moments of occurrence of task-relevant events be known precisely. Unfortunately,
in many settings the moments of occurrence of events relevant to the performance of
the operator’s task can neither be known in advance nor detected automatically. In these
environments, therefore, measures of the P300 evoked following target-relevant signals, or
other endogenous potentials evoked by task-relevant signals, cannot be used to monitor
alertness unless artificial secondary tasks and task-related stimuli are introduced. But these
may tend to focus the operator’s attention away from the primary task, thus reducing rather
than enhancing operator performance.

In  such  situations,  it  would  obviously  be  preferable  to  use  brain  responses  evoked  by 
task-irrelevant signals, if they were known to signal the early onset of performance
decrements. In this approach, stimuli irrelevant to the operator’s task would be delivered to
operators periodically to probe the ability of the central nervous system to respond to sensory
stimulation. Averaged event-related potentials (ERPs) to task-irrelevant probes have also been
shown to change profoundly in sleep. Early auditory ERP studies showed that a large N2
wave (peaking circa 350 ms) emerges at sleep onset and dominates sleeping responses
(Weitzman and Kremen, 1965; Williams, Tepas, and Morlock, 1962). Ornitz, Ritvo, Carr,
La Franchi and Walter (1967) reported that N2 response amplitude is greatest within 5-10
minutes of sleep onset (defined by the first appearance of sleep spindles in the EEG).

Fruhstorfer  and  Bergstrom  (1969)  attempted  to  measure  changes  in  the  auditory  evoked 
response during sleep transitions in more detail. They used a midline bipolar electrode
montage and presented clicks at 8-20 s intervals to subjects instructed to fall asleep and pay no
attention to the sounds they heard. By visually inspecting the EEG traces, they then
categorized each response epoch into one of nine stages of electroencephalographic vigilance
according to the criteria of Roth (1961) and Bente (1964). They reported that the click-evoked
response peaks N1 and P2 decline in amplitude with decrease in EEG vigilance (i.e.,
appearance of EEG changes associated with transition to sleep), and that a new N2 potential, with a
shorter latency and more posterior scalp distribution, appears in the later sub-stages. Their
figures show that for two subjects, this sleep N2 first appeared in their stage B2, characterized
by first appearance of 5-7 Hz slow waves. For the other four subjects, it appeared in stage C,
characterized by mixed 3-7 Hz and 12-14 Hz rhythms. These stages roughly correspond to
stage I sleep according to standard terminology (Rechtschaffen and Kales, 1968). However,
Fruhstorfer and Bergstrom (1969) did not report performance rates in their EEG sleep stages.
Accordingly, it remains of interest whether changes in the N2 and other features of the
task-irrelevant evoked response occur with early, intermittent decline in performance, or whether
they occur only after subjects cease responding completely.

Some  evidence  on  this  point  has  recently  been  presented  by  Ogilvie  et  al.  (1991),  who 
presented  relatively  long  duration  (<=  5s),  low  level  (27-30  dBSL)  target  tones  at  relatively 
long intervals (mean ISI 17.5s) during the first parts of all-night sleep sessions in which
subjects were instructed to respond to detected tones while at the same time allowing themselves
to fall asleep. Examination of evoked responses to first to fourth successive tones not
responded to revealed that the typical auditory sleep response features (including the sleep N2
and later event-related negativities they identify as K-complexes) were also visible in part in
averages of responses to those tones to which their subjects responded most slowly. However,
Ogilvie et al. did not attempt to quantify the actual time structure of fluctuations in
alertness in their subjects, only noting that transitions could be shorter than the sampling
interval (40 s per estimate) used in normal sleep staging.

In the present experiment, we use relatively frequent (20/min) supra-threshold
task-irrelevant auditory probe tones to evoke brain potentials whose features correlate with rises
and falls in error rate on a supra-threshold detection task requiring sustained attention. The
paradigm we use neither falls into the mold of most classical vigilance experiments, in which
responses to infrequent target events are studied, nor follows the signal detection paradigm,
which normally employs near-threshold signals and assumes that operator state is not
continuously fluctuating. Our experiment is also unlike most sleep studies, in which subjects
are instructed to fall asleep. In our case, although the experimental setting was conducive to
drowsiness, the subjects were instructed to attempt to perform the task continuously through
half-hour simulated sonar watches. Elsewhere (Makeig and Inlow, unpublished), we study in
detail the spectral structure of these fluctuations in alertness and explore their relationship to
simultaneous changes in the EEG spectrum. Our aim here is to quantify the temporal
characteristics of fluctuations in performance on the task, and then to show that several features of
sensory evoked responses to irrelevant auditory probes correlate with concurrent performance
level as drowsiness or inattention overtakes the operator and behavioral responses become
sporadic. In particular we find that averaged responses to probes occurring just before (1)
targets responded to (Hits) and (2) targets not responded to (Lapses) are linearly separable.
Finally, we discuss possible applications of this research to methods of objectively monitoring
the current vigilance level of operators.


Method 

Subjects.  Thirteen  males  participated  in  an  auditory  simulation  of  a  passive  sonar  target 
detection  task.  Of  these,  nine  were  prospective  students  in  a  Navy  sonar  course,  three  were 
sonar  instructors,  and  one  was  from  the  laboratory  staff.  Ages  ranged  from  18  to  34  (mean  24 
years).  All  had  passed  standard  Navy  hearing  tests. 

Stimuli.  Sound  synthesis  and  data  collection  were  controlled  by  a  Concurrent  Realtime  Unix 
computer  system  using  a  12-bit  D/A  converter  sampling  at  50  kHz.  The  three  stimulus 
streams  used  in  the  experiment  are  shown  schematically  in  Figure  1.  All  stimuli  were 
presented binaurally through headphones mounted in isolating cuffs in a white noise
background at 62 dB nHL. Task-irrelevant auditory probe tones of two frequencies (568 and 1098
Hz) were presented in random order with inter-stimulus intervals of 2-4 s at 72 dB
nHL. Probe tones were 50 ms in duration, with rise and fall times of 10 ms. To explore use
of response features associated with stimulus novelty, the high tone was presented more
frequently (80%) than the low tone (20%). The two stimuli will therefore be referred to below as
Frequent and Rare respectively.

Target noise bursts were 300 ms in duration with long rise and fall times of 150 and 110
ms respectively. Targets occurred in 50% of the 2-4 s inter-probe intervals, giving a mean
target presentation rate of 10 per minute. Target intensity was set at 6 dB above its relative
threshold in the noise. This intensity was high enough to produce initial performance levels at
or near ceiling, but was low enough not to startle subjects or delay onset of alertness
decrements.
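
The stimulus timing described above can be checked with a small simulation. This is only a sketch: the uniform distribution of the 2-4 s intervals, the placement of each target at a random point within its interval, and the function name `simulate_streams` are assumptions made for illustration, not details given in the text. With a mean inter-probe interval of 3 s, it should yield roughly 20 probes and 10 targets per minute.

```python
import random

def simulate_streams(duration_s=28 * 60, seed=0):
    """Simulate probe and target onset times for one 28 minute session.

    Assumptions (not from the source): probe intervals are drawn
    uniformly from 2-4 s, and each target falls at a uniformly random
    point inside its inter-probe interval.
    """
    rng = random.Random(seed)
    probes, targets = [], []
    t = 0.0
    while True:
        interval = rng.uniform(2.0, 4.0)
        if t + interval > duration_s:
            break
        # 80% of probes are the high (Frequent) tone, 20% the low (Rare) tone.
        tone = "Frequent" if rng.random() < 0.8 else "Rare"
        probes.append((t, tone))
        # A target occurs in 50% of the inter-probe intervals.
        if rng.random() < 0.5:
            targets.append(t + rng.uniform(0.0, interval))
        t += interval
    return probes, targets

probes, targets = simulate_streams()
minutes = 28.0
probe_rate = len(probes) / minutes      # expected near 20 per minute
target_rate = len(targets) / minutes    # expected near 10 per minute
frequent_fraction = sum(1 for _, tone in probes if tone == "Frequent") / len(probes)
```

Because the targets are tied to the probe intervals, the target rate is simply half the probe rate on average, which is where the figure of 10 per minute comes from.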

Steady-state click probe stimuli were also presented continuously through the
experimental sessions at a rate of 39 Hz and an intensity of 63 dB nHL. Within the noise background,
the click train was perceptible but not intrusive. It was presented to evoke a stable steady-state
response (SSR) which will be discussed in another report also dealing with changes in the
EEG spectrum.

EXPERIMENTAL STIMULI

Figure 1. Schematic view of the three stimulus streams in the experiment. Note
that durations and amplitudes are not to scale.

Procedure.  Each  subject  participated  in  two  simulated  work  sessions  of  28  minutes.  Subjects 
sat in a comfortable chair with eyes closed, their right index finger resting on a response
button. They were instructed to press the button as soon as possible each time they detected a
noise  burst,  and  to  ignore  the  probe  tones. 

Physiological Recording. Data from all sessions were continuously recorded to disk for
off-line analysis. EEG and EOG signals were amplified 50K times with a 0.1-100 Hz bandwidth
through Grass EEG amplifiers, then multiplexed with button press information and converted
to 12-bit digital format at a sampling rate of 312.5 Hz per channel. EEG was collected from
13 scalp locations of the International 10-20 system. An ECI Electro-Cap provided
standardized placement of Ag/AgCl electrodes at 13 sites (Fpz, F3, Fz, F4, T3, C3, Cz, C4, T4, P3,
Pz, P4, Oz) referred to the right mastoid. A left-to-right mastoid electrode channel was also
collected and data were re-referenced to digitally-linked mastoids during averaging. Periocular
electrodes were used to record electrical potentials generated by voluntary and involuntary
movements of the eyes during EEG recording. These consisted of one pair of electrodes
placed horizontally at the outer canthi, and a second pair placed one inch above and below
the left and right eyes respectively. Electrical impedances at all electrode sites were less than
five kOhms.
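
The re-referencing step mentioned above follows from simple arithmetic: if each scalp channel is recorded as (site minus right mastoid) and the extra channel as (left mastoid minus right mastoid), then subtracting half of the mastoid channel gives the site relative to the mean of the two mastoids. A minimal sketch, assuming that polarity for the mastoid channel (the text does not state it) and using a hypothetical function name:

```python
def rereference_to_linked_mastoids(channel_uv, left_minus_right_uv):
    """Re-reference samples recorded against the right mastoid.

    channel_uv: samples of (site - right mastoid), in microvolts.
    left_minus_right_uv: samples of (left mastoid - right mastoid).
    Returns samples of (site - mean of the two mastoids), using
    site - (L + R) / 2 = (site - R) - (L - R) / 2.
    The polarity of the mastoid channel is an assumption.
    """
    if len(channel_uv) != len(left_minus_right_uv):
        raise ValueError("channels must have the same number of samples")
    return [c - 0.5 * m for c, m in zip(channel_uv, left_minus_right_uv)]

# Worked check with made-up potentials: site = 10, left = 4, right = 2 (uV).
recorded = [10.0 - 2.0]          # site - right mastoid
mastoid_channel = [4.0 - 2.0]    # left - right mastoid
linked = rereference_to_linked_mastoids(recorded, mastoid_channel)
```

In the worked check the result is 7 uV, which matches 10 minus the mastoid mean of 3.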

Analysis. Responses to the target noise bursts were divided into three categories. If the
subject pressed the button within a time window of 150-2000 ms following the target onset, the
stimulus was designated a Hit. If no such response was made, it was labeled a Lapse. Stimuli
followed by inappropriate responses were labeled as Errors or False Alarms, but these were so
few in number (<6%) that stable averages of responses to them could not be formed, and they
will not be discussed further. Averaged evoked responses were computed for each stimulus
category. To prevent eye blinks or muscle potentials from contaminating recordings, EEG
epochs were excluded during averaging if potentials at any site exceeded ±90 uV. To remove
the SSR produced by the time-locked 39 Hz click stimuli, before analysis the averaged data
were lowpass filtered using a finite-impulse response (FIR) filter with a cutoff frequency near
32 Hz, and a slope of 12 dB per octave. To construct a local measure of performance, the
times of occurrence and response status of target stimuli were averaged using a
moving-window averaging algorithm which maintained a fixed width (in seconds) and step size (1.6
s).
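
The scoring rule and the moving-window error rate described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the window is assumed to be centered on each step, windows containing no targets return `None`, false alarms are ignored, and the function names are hypothetical.

```python
def score_targets(target_onsets, press_times, lo=0.150, hi=2.000):
    """Label a target 'Hit' if any press falls 150-2000 ms after onset, else 'Lapse'."""
    labels = []
    for onset in target_onsets:
        hit = any(lo <= press - onset <= hi for press in press_times)
        labels.append("Hit" if hit else "Lapse")
    return labels

def local_error_rate(target_onsets, labels, session_s, width_s=32.0, step_s=1.6):
    """Fraction of Lapses among targets inside a sliding window.

    Returns a list of (window_center_s, error_rate) pairs. Centering the
    window on each step is an assumption; error_rate is None when the
    window contains no targets.
    """
    half = width_s / 2.0
    n_steps = int(round(session_s / step_s))
    out = []
    for k in range(n_steps):
        center = k * step_s
        in_window = [lab for t, lab in zip(target_onsets, labels)
                     if center - half <= t <= center + half]
        rate = in_window.count("Lapse") / len(in_window) if in_window else None
        out.append((center, rate))
    return out

# Made-up example: six targets, three of them answered in time.
onsets = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
presses = [5.4, 10.5, 25.6]
labels = score_targets(onsets, presses)
rates = local_error_rate(onsets, labels, session_s=32.0)
```

With a 1.6 s step, a 1024-step session spans about 27.3 minutes, consistent with the 28 minute sessions described in the Procedure.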

Results


Figure 2. Fluctuations in local error rate for all 26 sessions (computed using a 32 s moving
window).


Performance 

Local error rate. The variability of individual session records is illustrated in Figure 2,
which shows fluctuations in local error rate (computed using a 32 s moving window) for each
of the 26 sessions. As the Figure shows, in these sessions performance varies from relatively
poor to near perfect. In some, intermittent lapses begin early in the session and continue
throughout, while in one session (marked with an asterisk), only four minutes of good
performance occur before a transition to a period of nearly five minutes during which the subject
did not respond at all. After this prolonged absence, performance returns to near perfect,
marked only by intermittent lapses every few minutes. At the end of this session, the subject
reported no memory of failing to respond.


Figure 3. Mean performance measures for data from all 26 sessions. (a) Error rate
histogram,  (b)  Reaction  time  histogram,  (c)  Error  rate  as  a  function  of  time  on  task, 
(d)  Reaction  time  as  a  function  of  time  on  task.  Both  (c)  and  (d)  were  smoothed  using  a 
circa  50  s  window. 


Figures  3a  and  3b  show  the  performance  and  reaction  time  (RT)  histograms  across  the 
26  sessions,  each  divided  into  1024  epochs  of  1.6  seconds  each.  Local  error  rate  (in  a  32  s 
moving window) was non-zero in 35% of the epochs; in 5% error rate was 100%. Figure 3c
shows performance as a function of time on task averaged over all 26 sessions using a 105 s
moving average. This figure resembles classic results of vigilance decrement (Mackworth,
1970); performance remains optimum for about three minutes, then error rate rises, in this
experiment reaching a plateau at 10 minutes into the task. The mean error rate maximum at
10 minutes is caused by a few subjects who at that point ceased responding for some minutes,
then resumed responding. However, it is clear from Figure 2 that few of the individual session
records closely resemble the behavior of the mean shown in Figure 3c. Figure 3d shows mean
RT as a function of time on task (also averaged using a 105 s moving window). Mean RT
also rises during the first part of the sessions, then stabilizes.

Figure 4. Mean reaction time averaged as a function of local error rate using a 10%
moving window.


Reaction  time  versus  error  rate.  Figure  4  plots  mean  reaction  time  by  local  error  rate.  The 
plot  is  remarkably  linear  across  almost  the  entire  range  of  error  rates,  the  regressed  value 
increasing  by  257  ms  from  0%  to  100%  errors.  This  figure  suggests  that  changes  in  the  local 
error  rate  measure  are  as  meaningful  a  measure  of  performance  as  changes  in  mean  RT. 
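
The regression described above can be illustrated with an ordinary least-squares fit. The data below are synthetic and noise-free: the 600 ms intercept is a hypothetical value chosen only for the example, while the slope of 2.57 ms per percentage point reproduces the 257 ms rise reported in the text. The function name `linear_fit` is also an assumption.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic illustration only; the intercept of 600 ms is hypothetical.
error_rate_pct = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mean_rt_ms = [600.0 + 2.57 * e for e in error_rate_pct]

slope, intercept = linear_fit(error_rate_pct, mean_rt_ms)
rt_rise_ms = slope * 100  # regressed change in RT from 0% to 100% errors
```

Applied to real session data, the same fit would take the local error rate at each Hit as `x` and the corresponding reaction time as `y`.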

Evoked  Responses 

Probe Responses. As expected, averaged responses to the noise target Hits contained a
prominent P300 maximum at Pz, while those to target Lapses did not. The target response data will
be presented elsewhere. Figure 5 shows grand mean ERPs to the Frequent and Rare
task-irrelevant probes at the vertex, where these potentials are largest, selectively averaged over
stimuli which immediately preceded correctly detected target bursts (Hits) or undetected
targets (Lapses). Note that neither the pre-Lapse nor pre-Hit responses to the Rare irrelevant
probes contain a P3 peak, confirming that the subjects allocated little or no attention to the
probe.

Task-irrelevant  probe  responses  to  frequent  probe  stimuli  preceding  target  Hits  and 
Lapses  differ  in  at  least  three  ways  (see  Figure  5).  Before  Lapses,  the  prominent  P2  and  N2 
peaks  are  larger,  and  the  N1  deflection  is  smaller  than  prior  to  Hits.  The  uniformity  of  these 
changes  across  subjects  was  studied  by  means  of  analysis  of  variance  on  the  14  sessions  from 
10  subjects  including  at  least  40  Lapses  each. 

Figure 6 shows the difference (pre-Lapse minus pre-Hit) in Frequent N2 probe response
peak amplitudes as a function of the number of sums in the pre-Lapse evoked response. In
sessions  with  fewer  lapses,  the  N2  difference  was  more  variable.  This  may  have  been  because 
the  smaller  number  of  sums  made  the  signal-to-noise  ratio  of  the  pre-Lapse  ERP  small,  or  else 
because  occasional  Lapses  occurring  during  the  relatively  error-free  sessions  may  not  have 
been  associated  with  sleep  transitions  and  associated  appearance  of  the  N2,  but  rather  were 
due  to  inattention  or  rare  random  failures  of  signal  detection. 

Figure 5. Irrelevant probe evoked responses. Responses to Rare and Frequent probe
stimuli occurring before detected (Hit) and undetected (Lapse) targets.

Scalp distribution. As Frequent probe response averages contained many more sums than
Rare probe responses, and were therefore more reliable, an analysis of variance was
performed on responses to the Frequent probe stimuli alone. At frontal and central channels the
P1-N1 difference was smaller pre-Lapse than pre-Hit (F(1,9) at all channels > 11.2; p<.01). At
left temporal sites, N2 amplitude was significantly larger preceding Lapses than preceding
Hits (F(1,9)>9.25; p<.015), and P2-N2 peak difference was also larger at the three sites Cz,
Fz, and F3 (F(1,9)>11.8; p<.01). Topographic maps of significance levels for these effects are
shown in Figure 7. This figure was constructed by performing ANOVAs separately on data
from each channel, converting the F-values to probability levels, and mapping the results
using weighted nearest-neighbor interpolation.
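
For a single channel, a two-level within-subject comparison such as pre-Lapse versus pre-Hit gives an F(1, n-1) statistic equal to the square of the paired t statistic. The sketch below assumes that design (one pre-Lapse and one pre-Hit value per subject, ten subjects) because it matches the reported F(1,9); the amplitudes are invented and the function name is hypothetical. For reference, the tabled critical value of F(1,9) at p = .01 is about 10.56.

```python
import math
from statistics import mean, stdev

def paired_f(pre_lapse, pre_hit):
    """F statistic for a two-level within-subject factor.

    Equals the squared paired t statistic, with (1, n - 1) degrees of
    freedom. Assumes one value per subject in each condition.
    """
    diffs = [a - b for a, b in zip(pre_lapse, pre_hit)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, (1, n - 1)

# Hypothetical P1-N1 amplitudes (uV) at one channel for ten subjects.
pre_hit = [6.1, 5.4, 7.0, 6.6, 5.9, 6.3, 7.2, 5.1, 6.8, 6.0]
pre_lapse = [4.9, 4.8, 5.5, 5.9, 4.6, 5.8, 5.7, 4.7, 5.2, 5.1]

f_value, dof = paired_f(pre_lapse, pre_hit)
significant_at_01 = f_value > 10.56  # tabled critical value of F(1,9) at p = .01
```

Repeating this per channel and converting each F to a probability level gives the values that a significance map like Figure 7 would interpolate.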
Figure 6. Difference in N2 amplitude versus performance. Abscissa is the number of
sums included in the Frequent probe pre-Lapse evoked response. Ordinate is the
difference in N2-to-baseline peak amplitude between pre-Hit and pre-Lapse responses.

Figure 7. Significance maps for difference in Frequent probe P1-N1, N2 and P2-N2
amplitudes between pre-Lapse and pre-Hit responses. Scale: black, p<.01; grey, p<.05;
white, p>.05.

Discriminant Analysis

As a first test of the feasibility of using irrelevant probe ERPs to estimate behavioral
alertness, we attempted to classify Rare and Frequent averaged probe responses for sites Fz,
Cz, C3, and C4 from the 15 sessions with highest number of Lapses, categorizing them into
two groups: those preceding detected targets (pre-Hits) and those preceding targets missed
(pre-Lapses). Values for peak amplitude and latency of peaks P1, N1, P2, and N2, and peak
amplitude differences P1-N1 and P2-N2 at all electrode sites were used for initial
classification. The classification functions were computed using the stepwise discriminant
analysis program BMDP 7M. Though the 15 sessions used were from 10 subjects, we
computed statistics on success of discrimination over sessions, without regard to subject, in order
to explore which ERP sites and features could most reliably discriminate our two classes of
data from one another. Initially we used all of the data [13 channels * 12 measures] as the
training set. Since these variables were highly correlated with one another, an exploratory
mode of analysis was used in order to rank measures in order of best discrimination.
Responses were classified as pre-Hit or pre-Lapse by assigning them to the nearest group
mean using Mahalanobis distance. The measures chosen first by the initial discriminant
analysis are given in Table 1 in the order selected.


TABLE 1. ERP MEASURES SELECTED FOR PRE-HIT VERSUS PRE-LAPSE DISCRIMINATION

Step  Measure                                    F-to-Enter
1.    Frequent N2 latency at Fz                  11.9
2.    Frequent P1-N1 amplitude difference at Cz  11.5
3.    Frequent N2 amplitude at C3                17.9
4.    Rare N2 amplitude at Cz                    8.2
5.    Frequent N1 latency at C4                  7.7
6.    Frequent P1-N1 amplitude difference at Fz  5.3
7.    Rare P2-N2 amplitude difference at C4      4.3

Using  these  seven  variables,  a  100%  discrimination  between  pre-Hit  and  pre-Lapse  responses 
was  achieved.  As  a  first  test  of  the  reliability  of  discrimination,  a  jackknife  procedure  was 
then  used  in  which  discrimination  criteria  based  on  each  combination  of  14  sessions  were 
used to classify the responses from the 15th session. Only one classification error occurred,
though  variables  chosen  varied  slightly. 
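
The nearest-group-mean rule with Mahalanobis distance can be sketched in a few lines. This is not the BMDP 7M implementation: it uses only two features so the covariance matrix can be inverted by hand, it pools the within-group covariance across the two groups, and all data values and function names are invented for illustration.

```python
def mean_vec(rows):
    """Column means of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_cov_2d(group_a, group_b):
    """Pooled within-group covariance matrix for two groups of 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group in (group_a, group_b):
        m = mean_vec(group)
        for r in group:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def mahalanobis_sq(x, m, cov):
    """Squared Mahalanobis distance from x to mean m under covariance cov."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [x[0] - m[0], x[1] - m[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

def classify(x, pre_lapse, pre_hit):
    """Assign x to the group whose mean is nearest in Mahalanobis distance."""
    cov = pooled_cov_2d(pre_lapse, pre_hit)
    d_lapse = mahalanobis_sq(x, mean_vec(pre_lapse), cov)
    d_hit = mahalanobis_sq(x, mean_vec(pre_hit), cov)
    return "pre-Lapse" if d_lapse < d_hit else "pre-Hit"

# Hypothetical session averages: (N2 amplitude, P1-N1 difference) in uV.
pre_lapse = [(-6.0, 3.0), (-7.0, 2.5), (-5.0, 3.5), (-6.5, 2.0)]
pre_hit = [(-2.0, 5.0), (-3.0, 6.0), (-1.0, 5.5), (-2.5, 4.5)]

label_a = classify((-6.0, 2.8), pre_lapse, pre_hit)
label_b = classify((-2.0, 5.2), pre_lapse, pre_hit)
```

Unlike plain Euclidean distance, the Mahalanobis form rescales each direction by the shared variability of the measures, which matters here because the ERP measures are highly correlated.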

TABLE 2. CLASSIFICATION MATRICES FOR 5 VARIABLE DISCRIMINATION

                        Percent    Classification
Test       Response     Correct    pre-Lapse  pre-Hit
training   pre-Lapse    100.0      15         0
           pre-Hit      100.0      0          15
jackknife  pre-Lapse    86.7       13         2
           pre-Hit      100.0      0          15


Since  our  sample  size  is  small  and  our  measures  highly  correlated,  we  tried  to  make  our 
variable  selection  more  robust  by  running  the  same  stepwise  discriminant  analyses  using  eight 
random  subsets  of  12  of  the  15  sessions.  As  before,  the  Frequent  probe  responses  dominated, 
with  only  one  Rare  probe  response  measure  being  included.  The  measures  selected  were  the 
original  top  five  except  for  "Frequent  N2  latency  at  C4,"  which  replaced  the  same  variable  at 
site  Fz.  This  substitution  demonstrates  the  difficulty  of  choosing  a  "best"  set  of  classification 
variables  when  the  variables  are  highly  correlated.  Table  2  shows  the  classification  matrices 
derived  for  the  five  selected  variables.  The  top  half  presents  the  results  from  using  the  initial 
training  set  as  the  test  (validation)  set,  and  the  bottom  half  gives  the  results  of  a  standard 
"jackknife"  procedure  in  which  each  observation  is  classified  using  classification  functions 
computed  from  the  remaining  observations.  Clearly  the  pre-Hit  and  pre-Lapse  ERPs  differ 
enough for successful classification of operator performance.
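
The jackknife (leave-one-out) procedure described above can be sketched with any classifier. For brevity the example below uses a one-dimensional nearest-mean rule rather than the stepwise discriminant functions used in the analysis; the data values and function names are hypothetical. The last helper shows how the percentages in the classification matrices follow from the counts (for example, 13 of 15 is 86.7%).

```python
def nearest_mean_label(x, train):
    """Assign x to the label whose training mean is closest.

    train is a list of (value, label) pairs.
    """
    means = {}
    for label in sorted({lab for _, lab in train}):
        vals = [v for v, lab in train if lab == label]
        means[label] = sum(vals) / len(vals)
    return min(sorted(means), key=lambda lab: abs(x - means[lab]))

def jackknife_matrix(data):
    """Classify each observation using all the others.

    Returns a dict mapping (true_label, predicted_label) to a count.
    """
    counts = {}
    for i, (x, true_label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        predicted = nearest_mean_label(x, train)
        key = (true_label, predicted)
        counts[key] = counts.get(key, 0) + 1
    return counts

def percent_correct(n_correct, n_total):
    """Percentage of correctly classified cases, rounded to one decimal."""
    return round(100.0 * n_correct / n_total, 1)

# Hypothetical N2 amplitudes (uV) for six session averages.
data = [(-6.0, "pre-Lapse"), (-7.0, "pre-Lapse"), (-5.0, "pre-Lapse"),
        (-2.0, "pre-Hit"), (-3.0, "pre-Hit"), (-1.0, "pre-Hit")]
matrix = jackknife_matrix(data)
jackknife_pre_lapse_pct = percent_correct(13, 15)  # counts taken from Table 2
```

Because each held-out case plays no part in fitting the rule that classifies it, the jackknife percentages are a less optimistic estimate than the training-set percentages.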

TABLE 3. CLASSIFICATION MATRICES FOR 3 VARIABLE DISCRIMINATION

                        Percent    Classification
Test       Response     Correct    pre-Lapse  pre-Hit
training   pre-Lapse    80.0       12         3
           pre-Hit      93.3       1          14
jackknife  pre-Lapse    80.0       12         3
           pre-Hit      93.3       1          14

Next, we ran this analysis restricting the variables to the first three selected: "Frequent
N2 latency at C4," "Frequent P1-N1 amplitude difference at Cz," and "Frequent N2 amplitude
at C3." The resulting jackknife classification matrices (Table 3) were as accurate as the
training data results, suggesting that classification functions based on these three measures should
generalize well to other sessions. Subsequent to this analysis, we discovered that mechanical
problems had interfered with the recording of one of the 15 sessions used in the discriminant
analysis. Had data from this session not been used, the classification accuracy would have been
even higher.

Finally,  to  study  how  classification  performance  is  degraded  by  using  measures  of 
responses  from  a  single  site,  we  used  the  same  three  component  measures  but  restricted  them 
to  site  Cz.  The  results  (Table  3)  were  comparable  in  power  to  the  multi-channel  classification, 
suggesting  that  in  operational  use,  multi-site  recording  of  auditory  responses  might  not  offer 
significant advantages over single-site recording.

Figure 8. Significance map for discriminability of pre-Hit and pre-Lapse ERPs. F(3,26)
transform of Wilks’ lambda statistic for three peak amplitude and latency measures.
(Scale: Black, p<.01; grey, p<.05; white, p>.05).

To determine which site on the scalp showed the most reliable differences between pre-Hit
and pre-Lapse responses, we estimated the spatial distribution of the classification
information provided by our top three measures by computing Wilks’ lambda for each site,
suitably transformed to have an approximate F distribution. In Figure 8 we have mapped the
approximate F-values at each recording site. Since the lower bound of the figure scale is the
upper 0.001 critical value of the F(3,26) distribution, it is clear that there is a significant
difference between the pre-Lapse and pre-Hit session averages at all sites. The maximum
F-value occurs at or near the three sites Cz, Fz, and F3, indicating that for the selected
sessions, these measures, in a multivariate sense, attain their most discriminable pre-Lapse
to pre-Hit difference in this region. Note that this region is the intersection of the highly
significant regions for P1-N1 and P2-N2 differences in Figure 7.
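
The transform of Wilks' lambda to an F value at one site can be sketched as follows. This is a hedged illustration rather than the procedure actually run for Figure 8: it assumes the standard two-group result F = ((1 - lambda) / lambda) * ((n - p - 1) / p) with (p, n - p - 1) degrees of freedom, which gives F(3,26) for p = 3 measures and n = 30 session averages. The function names and the random data are hypothetical.

```python
import numpy as np

def sscp(D):
    """Sum-of-squares-and-cross-products matrix about the column means."""
    D = np.asarray(D, dtype=float)
    D = D - D.mean(axis=0)
    return D.T @ D

def wilks_lambda_two_groups(A, B):
    """Return (lambda, F, (df1, df2)) for two groups of multivariate observations."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    X = np.vstack([A, B])
    n, p = X.shape
    W = sscp(A) + sscp(B)   # within-group scatter
    T = sscp(X)             # total scatter
    lam = np.linalg.det(W) / np.linalg.det(T)
    df1, df2 = p, n - p - 1
    F = (1.0 - lam) / lam * df2 / df1
    return lam, F, (df1, df2)

# Hypothetical data for one site: 15 pre-Hit and 15 pre-Lapse averages, 3 measures.
rng = np.random.default_rng(1)
pre_hit = rng.normal(0.0, 1.0, (15, 3))
pre_lapse = rng.normal(1.0, 1.0, (15, 3))
lam, F, df = wilks_lambda_two_groups(pre_hit, pre_lapse)
```

Repeating this computation at every recording site and mapping F against the critical values of the F(3,26) distribution would yield a map of the kind shown in Figure 8.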
Figure 9. Grand-mean Frequent irrelevant-probe evoked response as a function of local
error rate.

Error-sorted Moving Average


To explore the dynamics of the significant changes between pre-Hit and pre-Lapse
responses, the entire set of single-trial responses at Cz was ordered by local error rate,
computed using a moving window of 32 seconds. A moving average of the reordered data was
then constructed using a window width of 30% (error rate) and a step size of 2%. These
values were chosen to yield smoothly varying mean estimates. The results of this procedure
are shown in Figure 9. The top trace in this figure shows responses centered in error-free
epochs; the bottom trace shows responses in no-response epochs. In the figure, as error rate
increases, a prominent N2 appears and P1-N1 decreases.
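
The error-sorted moving average can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the 32-s window is assumed to be centered on each probe, bins are assumed to be centered at 0%, 2%, ..., 100% error, and all names and data are hypothetical.

```python
import numpy as np

def local_error_rate(target_times, detected, at_times, window_s=32.0):
    """Percent of targets missed within a window centered on each time in at_times."""
    target_times = np.asarray(target_times, dtype=float)
    missed = ~np.asarray(detected, dtype=bool)
    rates = np.full(len(at_times), np.nan)
    for i, t in enumerate(at_times):
        in_win = np.abs(target_times - t) <= window_s / 2.0
        if in_win.any():                      # NaN is kept if no target falls in the window
            rates[i] = 100.0 * missed[in_win].mean()
    return rates

def error_sorted_average(epochs, error_rate, width=30.0, step=2.0):
    """Average single-trial epochs within overlapping error-rate bins.

    epochs: (n_trials, n_samples) array; error_rate: percent error per trial.
    Returns bin centers and one averaged trace per center (NaN where a bin is empty).
    """
    centers = np.arange(0.0, 100.0 + step / 2.0, step)
    traces = np.full((len(centers), epochs.shape[1]), np.nan)
    for k, c in enumerate(centers):
        sel = np.abs(error_rate - c) <= width / 2.0
        if sel.any():
            traces[k] = epochs[sel].mean(axis=0)
    return centers, traces

# Hypothetical session: a target every 6 s for 28 minutes, a probe every 3 s.
rng = np.random.default_rng(2)
targets = np.arange(0.0, 28 * 60.0, 6.0)
hits = rng.random(len(targets)) > 0.3
probes = np.arange(1.0, 28 * 60.0, 3.0)
rate = local_error_rate(targets, hits, probes)
epochs = rng.normal(0.0, 1.0, (len(probes), 128))   # stand-in single-trial responses
centers, traces = error_sorted_average(epochs, rate)
```

Wide, heavily overlapping bins trade resolution along the error-rate axis for smoother mean estimates, which is the compromise the window and step values above are meant to strike.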


Figure  10.  Grand  mean  Frequent  irrelevant  probe  ERP  peak  amplitudes  versus  local 
error  rate.  Data  from  Figure  9. 


Figure 10 plots the amplitudes of the grand mean P1, N1, P2, and N2 peaks versus error
rate. This figure was constructed by determining the latency of the maximum for each peak,
and extracting potentials at these latencies from the data of Figure 9. Grand mean N2
amplitude increases from baseline, at 0% errors, to 8 µV at 100% errors, though it is not
clear whether there is any increasing trend for midrange error rates from 30% to 70%. Grand
mean P2 appears constant until error rate reaches approximately 30%, and then increases in
size as error rate rises further. Meanwhile, P1 amplitude remains stable across error rates
up to 90%, while the size of the N1 potential decreases nearly monotonically as error rate
rises.
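
The peak extraction behind Figure 10 can be sketched as follows. This is an illustration under assumptions, not the original procedure: each peak's latency is taken here from the mean of all error-sorted traces, and the search windows are hypothetical (only the N2 latency near 380 ms is stated in this report).

```python
import numpy as np

# Hypothetical search windows (start ms, end ms, polarity); +1 positive peak, -1 negative.
WINDOWS = {
    "P1": (30, 80, +1),
    "N1": (80, 150, -1),
    "P2": (150, 250, +1),
    "N2": (250, 450, -1),
}

def peak_amplitudes(traces, times_ms, windows=WINDOWS):
    """Find each peak's latency in the mean trace, then sample every trace there.

    traces: (n_error_bins, n_samples) error-sorted averages; times_ms: sample times.
    Returns {peak name: (latency in ms, amplitude per error-rate bin)}.
    """
    mean_trace = np.nanmean(traces, axis=0)   # ignores empty (all-NaN) bins
    out = {}
    for name, (lo, hi, sign) in windows.items():
        idx = np.flatnonzero((times_ms >= lo) & (times_ms <= hi))
        peak = idx[np.argmax(sign * mean_trace[idx])]
        out[name] = (times_ms[peak], traces[:, peak])
    return out

# Hypothetical traces: a negative deflection at 380 ms that grows with error rate.
times_ms = np.arange(0.0, 500.0, 10.0)
gain = np.array([1.0, 2.0])
demo = -gain[:, None] * np.exp(-((times_ms - 380.0) / 30.0) ** 2)
peaks = peak_amplitudes(demo, times_ms)
```

Fixing one latency per peak, rather than re-detecting the peak in every trace, keeps the amplitude curves comparable across error rates even where a component is small or absent.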
Figure 11. Frequent irrelevant probe ERP as a function of local error rate. Left panel,
means of all probes preceding targets. Center panel, means of all probes preceding Hits.
Right panel, means of all probes preceding Lapses.

Figure 11 shows the grand mean moving averages of just those Frequent probes
which immediately preceded noise targets (with the further condition that at least 600 ms
separated the noise burst and probe tone onsets). In the center panel, the moving average of
those probes preceding Hits is shown, and in the right panel, the moving average of probes
preceding Lapses. The left panel shows the moving average of both categories. Naturally, this
average is dominated by pre-Hit epochs at low error rates, and by pre-Lapse epochs at high
error rates, and this is reflected in the form of the evoked response.


Discussion 

The present study has shown that at least two features of auditory evoked responses to
completely task-irrelevant probe tones can index subject alertness without requiring that the
subject attend to the probe tones or perform a secondary task. In particular, our results show
that the auditory N2 wave (in this study peaking near 380 ms), which is known to become
prominent during deeper stages of sleep, first appears just as alertness begins to become
sporadic and, across subjects, increases in amplitude monotonically and near-linearly as the
frequency of performance lapses increases. Emergence of this sleep-N2, signaled by an
increased latency for the N2 peak measure, was the measure which best separated pre-Hit
from pre-Lapse responses.

Also, as lapses in attention became more frequent, peak N1 amplitude steadily decreased,
and this quantity (best estimated in our data as the P1-N1 amplitude difference) also reliably
differentiated mean pre-Hit from pre-Lapse responses. This result complements the many
studies which have shown that attending to a stream of sounds enhances the N1 amplitude of
responses to sounds in the attended stream (Hillyard, Hink, Schwent, and Picton, 1973).

A primary goal of this research was to explore the possibility that task-irrelevant evoked
responses could be used to monitor operator alertness. In ERP research on performance
estimation, most attention has been paid to use of the P300 response, which is evoked when a
target or unanticipated task-relevant event is detected. The P300 may be useful for monitoring
attention to, or workload during, machine-paced primary or secondary tasks (Kramer, 1991).
However, in many actual work environments, recording ERPs to target information may not
be possible, and task-irrelevant probe responses appear to be a possible replacement.

Our results show that at least three measures of averaged irrelevant-probe ERPs differ
substantially enough to enable pre-Hit and pre-Lapse states to be discriminated. However,
note that we tested only those sessions in which the pre-Lapse ERPs were averages of at
least 40 trials. The averaged responses discriminated in these procedures were thus averages
of data collected over entire half-hour sessions. A method which detected loss of alertness
only after many minutes of seriously impaired operator capacity would only partially achieve
timely cognitive status monitoring. To be of practical use, a discrimination procedure would
have to work with averages of fewer trials in order to estimate changes in performance within
sessions.

However, our results do suggest that single-channel classification performance is
competitive with multi-channel results, a finding with practical advantages since
single-channel recording could be much easier than multi-channel recording in the workplace.
The optimum site for performance estimation may be in the vertex region (Cz to Fz). While a
slight left-sided bias in predictive power appears in our N2 data, response state-by-side
interactions were not statistically significant. Also, in this experiment, ERPs to the frequent
irrelevant probes provided most of the useful information for classification. Though this
might be due to the larger number of frequent probe ERPs available for averaging, it suggests
that the use of rare or "oddball" irrelevant probes may not give more robust information
about performance than responses to single-frequency irrelevant probes alone.

In our view, an optimal alertness monitoring system would not use probe ERP data
alone. The EEG averaged to create evoked responses also contains much information about
operator state which may not be reflected directly in the ERP (Townsend and Johnson, 1979;
Makeig and Inlow, unpublished). Other physiological phenomena, including heart-rate
fluctuations, slow eye movements, and steady-state response amplitude, may all provide
convergent evidence of fluctuations in alertness that need not and should not be ignored by
designers of such a system.

The statistical methods used to predict performance are also capable of much improvement.
The linear discriminant procedure we have used in this report creates classification
functions which are optimal only if the variables are normally distributed and the covariance
matrices for the variables are the same for both groups. It is likely that these assumptions are
not fully met by our data, and this will reduce the effectiveness of any algorithm derived
using linear methods. Discrimination techniques which do not make these assumptions and
which are not restricted to linear functions of the variables (neural network methods, for
example) should be expected to provide superior classification speed and performance
(Gorman and Sejnowski, 1988).

The discrimination procedure used in this paper also attempted to differentiate responses
from all subjects using a single criterion. Another, possibly more successful approach would
be to collect pilot data and develop a separate non-linear classification algorithm for each
subject. We expect that a system sufficiently robust for routine use would need first to adapt
to baseline data for each operator and then monitor several concurrent streams of
physiological information using non-linear discriminant methods. It remains to be seen
whether, in such a system combining multiple streams of electrophysiological information,
measures of averaged or single-trial evoked responses can contribute enough predictive power
to justify their use.